Integration Of Hand-Crafted And Statistical Resources In Measuring Word Similarity

Authors

  • Atsushi Fujita
  • Toshihiro Hasegawa
  • Takenobu Tokunaga
Abstract

This paper proposes a new approach to word similarity measurement. Statistics-based computation of word similarity has been popular in recent research, but is associated with a significant computational cost. On the other hand, the use of hand-crafted thesauri as semantic resources is simple to implement, but lacks mathematical rigor. To integrate the advantages of these two approaches, we aim at calculating a statistical weight for each branch of a thesaurus, so that we can measure word similarity simply based on the length of the path between two words in the thesaurus. Our experiment on Japanese nouns shows that this framework upheld the inequality of statistics-based word similarity with an accuracy of more than 70%. We also report on the effectiveness of our framework in the task of word sense disambiguation.
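The idea of measuring similarity by weighted path length can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy thesaurus, the node names, and the branch weights below are all hypothetical, and the abstract does not say how the weights are estimated from statistics. The sketch only shows how, once each branch carries a weight, the distance between two words reduces to summing weights along the path through their lowest common ancestor.

```python
# Toy thesaurus stored as child -> (parent, branch weight).
# Assumption: every name and weight here is invented for illustration;
# in the paper the weights would be derived from statistical data.
PARENT = {
    "animal": ("entity", 1.0),
    "artifact": ("entity", 1.2),
    "dog": ("animal", 0.4),
    "cat": ("animal", 0.5),
    "car": ("artifact", 0.6),
}


def path_to_root(word):
    """Return (node, cumulative weight from word) pairs from word up to the root."""
    path = [(word, 0.0)]
    total = 0.0
    node = word
    while node in PARENT:
        parent, weight = PARENT[node]
        total += weight
        path.append((parent, total))
        node = parent
    return path


def weighted_distance(a, b):
    """Sum the branch weights on the path between a and b.

    The path runs from a up to the lowest common ancestor and down to b.
    """
    up_from_a = dict(path_to_root(a))
    # Walking up from b, the first node also reachable from a is the
    # lowest common ancestor.
    for node, dist_from_b in path_to_root(b):
        if node in up_from_a:
            return up_from_a[node] + dist_from_b
    raise ValueError("words share no common ancestor")


if __name__ == "__main__":
    # A smaller distance is read as higher similarity.
    print(weighted_distance("dog", "cat") < weighted_distance("dog", "car"))
```

Under this reading, the evaluation described in the abstract amounts to checking how often an ordering such as "dog is closer to cat than to car" agrees with the ordering given by the statistics-based similarity.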

Similar resources

Measuring the Impact of Sense Similarity on Word Sense Induction

Word Sense Induction (WSI) is an unsupervised learning approach to discovering the different senses of a word from its contextual uses. A core challenge to WSI approaches is distinguishing between related and possibly similar senses of a word. Current WSI evaluation techniques have yet to analyze the specific impact of similarity on accuracy. Therefore, we present a new WSI evaluation that quan...

Tagging Unknown Words using Statistical Methods

This paper examines the feasibility of using statistical methods to train a part-of-speech tagger, particularly with respect to unknown words. Training a part-of-speech tagger on a tagged corpus, without incorporating hand-crafted linguistic information, allows that tagger to be used for any language. The use of statistical methods has given encouraging results in experiments performed using th...

A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words

We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and the phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and weighted Levenshtein distance. The experimental results show an 11-pt interpo...

Evaluating Lexical Similarity to build Sentiment Similarity

In this article, we propose to evaluate the lexical similarity information provided by word representations against several opinion resources using traditional Information Retrieval tools. Word representations have been used to build and extend opinion resources such as lexicons and ontologies, and their performance has been evaluated on sentiment analysis tasks. We question this method by meas...

A Comparison of Approaches for Measuring Cross-Lingual Similarity of Wikipedia Articles

Wikipedia has been used as a source of comparable texts for a range of tasks, such as Statistical Machine Translation and Cross-Language Information Retrieval. Articles written in different languages on the same topic are often connected through inter-language links. However, the extent to which these articles are similar is highly variable and this may impact on the use of Wikipedia as a compar...

Publication date: 1997